About

This storyboard presents the results of all 12 case studies performed in the book “Text Mining: An Uncharted Territory for Librarians”. In labels such as 1A, 1B and 4B, the number indicates the chapter and the letter distinguishes analyses performed with different tools. For instance, 1A shows the Chapter 1 clustering results obtained with Orange, and 1B shows the Chapter 1 clustering results obtained with R. To learn more about the case studies and the methodology used to obtain the results, please read the book, available from Springer.

1A

Heatmap showing distances between documents


The heatmap plot shows the distances between the documents.

Clustered Heatmap showing distances between documents


The clustered heatmap plot shows another way to visualize the distances between the documents.

Dendrogram showing hierarchical clustering of documents


The dendrogram presents the hierarchical clustering of documents using Ward's method.
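To make the idea behind Ward's method concrete, here is a minimal Python sketch. It assumes the documents are already numeric vectors (for example, term counts). It is not Orange's implementation, and the function name `ward_merges` is hypothetical. At each step it merges the pair of clusters whose union increases the total within-cluster sum of squares the least. That increase is n_a * n_b / (n_a + n_b) times the squared distance between the two centroids.

```python
def ward_merges(points):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    Returns (cluster_a, cluster_b, merge_cost) tuples in merge order, where each
    cluster is a tuple of original point indices. Hypothetical helper.
    """
    # Each cluster keeps its member indices, its size and its centroid.
    clusters = [((i,), 1, list(p)) for i, p in enumerate(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                _, na, ca = clusters[a]
                _, nb, cb = clusters[b]
                # Ward cost: growth of the within-cluster sum of squares
                # caused by merging clusters a and b.
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        ma, na, ca = clusters[a]
        mb, nb, cb = clusters[b]
        # The merged centroid is the size-weighted mean of both centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merges.append((ma, mb, cost))
        del clusters[b]  # b > a, so index a is still valid afterwards
        clusters[a] = (ma + mb, na + nb, centroid)
    return merges
```

Using the merge costs as heights gives a dendrogram. Plotting tools may rescale these heights, so absolute values can differ from those in Orange's plot while the merge order stays the same.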

1B

Determining the number of clusters (k) using the elbow method


For clustering in R, the elbow method was used to determine the number of clusters.
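The elbow method can be sketched as follows. Run k-means for increasing k, record the within-cluster sum of squares (WSS) for each k, and look for the k after which WSS stops dropping sharply. This pure-Python sketch assumes the documents are numeric vectors. `kmeans_wss` and `elbow_curve` are hypothetical helpers, not the R functions used in the book. Because k-means can settle in local optima, the sketch keeps the best of a few random restarts.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wss(points, k, iters=50, seed=0):
    """Run Lloyd's k-means once and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid).
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_curve(points, max_k, restarts=5):
    """WSS for k = 1..max_k, keeping the lowest value over several restarts."""
    return [
        min(kmeans_wss(points, k, seed=s) for s in range(restarts))
        for k in range(1, max_k + 1)
    ]
```

Plotting `elbow_curve(points, 10)` against k = 1..10 and picking the point where the curve bends gives the number of clusters.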

Visualizing distance matrices


Euclidean distance was used to measure the distance between the documents.
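As a small illustration, the sketch below represents each document as a raw term-count vector and computes the pairwise Euclidean distance d(u, v) = sqrt(sum_i (u_i - v_i)^2). Raw counts are an assumption: the case study may weight terms differently, for example with TF-IDF. The helper names are hypothetical.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Turn each document into a raw term-count vector over a shared vocabulary."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]


def euclidean_matrix(vectors):
    """Pairwise Euclidean distances; symmetric with a zero diagonal."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]
```

For the documents "library text mining", "text mining tools" and "library services", the first two differ in two terms and are sqrt(2) apart. The first and third are sqrt(3) apart.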

Agglomerative hierarchical clustering


Hierarchical clustering with dendrograms is another way to visualize the distances between the documents.

Circular Dendrogram


A circular dendrogram is yet another way to visualize the distances between the documents.

Phylogenic Dendrogram


A phylogenic tree presents the same results from a different perspective, which may suit some research problems and datasets better than the other views.

4A

Topic Modeling using Topic-Modeling-Toolkit (TMT)

Timeline showing the core topics in DESIDOC Journal of Library and Information Technology from 1981 to 2018 (©2019 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2019))


Fifty core topics were identified as fitting the corpus of 928 DJLIT research articles, of which only 29 were unique.

4B

Topic Modeling using RapidMiner

Latent Dirichlet Allocation Topic and Word Result for PQDT Global ETDs during 2014-2018 (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))


The results show the topics assigned to the corpus of ETDs.

4C

Method 1: Plotting top words using the stm package


The figure shows the results for 5 topics using Structural Topic Modeling (STM).

Method 2: Plotting the MAP histogram using the stm package


The figure shows a second way of representing the results from Method 1.

Method 3: Visualizing topic model using ggplot2


The figure shows a third way of representing the results from Methods 1 and 2.

Method 4: Interactive Visualization


The figure shows a fourth way of representing the results from Methods 1, 2, and 3.

Understanding topics through top 5 representative documents


The table presents the top five representative ETDs for each modeled topic, ranked according to their probability.
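Ranking documents by topic probability amounts to sorting the document-topic proportion matrix (often called theta) by a single topic's column. Below is a minimal Python sketch, assuming `theta[d][k]` holds the proportion of topic k in document d. `top_documents` is a hypothetical helper; in R, stm provides `findThoughts()` for this purpose.

```python
def top_documents(theta, doc_ids, topic, n=5):
    """Return the n (doc_id, proportion) pairs with the highest share of `topic`."""
    ranked = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)
    return [(doc_ids[d], theta[d][topic]) for d in ranked[:n]]
```

Calling `top_documents(theta, ids, k)` once per topic k reproduces the layout of a table of top five documents per topic.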

Topic correlation


The figure shows the correlations between topics as a network graph.
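One simple way to build such a graph is to correlate the topic-proportion columns across documents and draw an edge wherever the correlation exceeds a cutoff. The sketch below assumes a document-topic matrix `theta`. The cutoff of 0 and the helper names are assumptions, and stm's `topicCorr()` may estimate the correlations differently.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def topic_network(theta, cutoff=0.0):
    """Edges (i, j, r) between topics whose proportions correlate above `cutoff`."""
    n_topics = len(theta[0])
    # Column k holds topic k's proportion in every document.
    columns = [[row[k] for row in theta] for k in range(n_topics)]
    edges = []
    for i, j in combinations(range(n_topics), 2):
        r = pearson(columns[i], columns[j])
        if r > cutoff:
            edges.append((i, j, r))
    return edges
```

Only positively correlated topic pairs become edges, so topics that tend to appear together in the same documents end up linked in the graph.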

5A

Network Text Analysis of Documents using Bibliometrix in R

Word Co-Occurrence Network


The figure presents the word co-occurrence network of the top 50 words representing the literature on malaria indexed in the Web of Science (WoS) database for the year 2019.
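The basic construction of a word co-occurrence network can be sketched as follows. Keep the top N words, then link every pair of those words that appears in the same document, weighting each edge by the number of such documents. This simplified Python sketch assumes whitespace tokenization and ranks words by document frequency. bibliometrix builds its networks from the bibliographic field you select and may apply its own processing, which this sketch does not reproduce.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs, top_n=50):
    """Weighted edges between the top_n most frequent words that share a document."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    # Document frequency: how many documents contain each word.
    doc_freq = Counter(w for words in tokenized for w in words)
    top = {w for w, _ in doc_freq.most_common(top_n)}
    edges = Counter()
    for words in tokenized:
        kept = sorted(words & top)
        # Each pair of top words in the same document adds 1 to that edge's weight.
        for a, b in combinations(kept, 2):
            edges[(a, b)] += 1
    return edges
```

With a hypothetical list of abstracts, `cooccurrence_network(abstracts, top_n=50)` mirrors the top-50 setting used in the figure.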

5B

Network Text Analysis of Documents using Textnets in R

Text Network


The figure shows the 22 clusters (communities) of 238 words (nodes) determined from the network text analysis of the data.

6A

Burst Detection of Documents using Sci2

9

Predictive Modeling of Documents using RapidMiner

Screenshot of evaluation result (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))


The screenshot shows the evaluation results for library science ETDs in the PQDT Global database.